We describe several optimizations which can be employed in a dynamic binary translation 
(DBT) system, where low compilation/translation overhead is essential. These 
optimizations achieve a high degree of ILP, sometimes even surpassing a static compiler 
employing more sophisticated, and more time-consuming algorithms [9]. We present 

Achieving good performance in bytecoded language interpreters is difficult without 
sacrificing both simplicity and portability. This is due to the complexity of dynamic 
translation ("just-in-time compilation") of bytecodes into native code, which is the 
mechanism employed universally by high-performance interpreters. We demonstrate that 
a few simple techniques make it possible to create highly-portable dynamic translators 
that can attain as much as 70% the performance of optimized C for certain n 

Dynamic binary translation is the process of translating and optimizing executable code 
for one machine to another at runtime, while the program is "executing" on the target 
machine. 

Dynamic translation techniques have normally been limited to two particular machines: a 
competitor's machine and the hardware manufacturer's machine. This research provides 
for a more general framework for dynamic translations, by providing a framework based 
on specifications of machines that ... 

We describe the design and implementation of Dynamo, a software dynamic optimization 
system that is capable of transparently improving the performance of a native instruction 
stream as it executes on the processor. The input native instruction stream to Dynamo 
can be dynamically generated (by a JIT for example), or it can come from the execution 
of a statically compiled native binary. This paper evaluates the Dynamo system in the 
latter, more challenging situation, in order to emphasize the 







Binary translation and architecture convergence issues for IBM system/390 

Michael Gschwind, Kemal Ebcioglu, Erik Altman, Sumedh Sathaye 

May 2000 Proceedings of the 14th international conference on Supercomputing 

Publisher: ACM Press 

Full text available: pdf(1.44 MB) Additional Information: full citation, abstract, references, index terms 




We describe the design issues in an implementation of the ESA/390 architecture based on 
binary translation to a very long instruction word (VLIW) processor. During binary 
translation, complex ESA/390 instructions are decomposed into instruction "primitives" 
which are then scheduled onto a wide-issue machine. The aim is to achieve high 
instruction level parallelism due to the increased scheduling and optimization 
opportunities which can be exploited by binary translation software ... 

Increasing the size of atomic instruction blocks using control flow assertions 

Sanjay J. Patel, Tony Tung, Satarupa Bose, Matthew M. Crum 

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture 

Publisher: ACM Press 

Full text available: pdf(140.81 KB), ps(646.25 KB) Additional Information: full citation, references, citings, index terms 






Back to the future: the story of Squeak, a practical Smalltalk written in itself 

Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, Alan Kay 

October 1997 ACM SIGPLAN Notices, Proceedings of the 12th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications OOPSLA '97, volume 32 issue 10 





Publisher: ACM Press 

Full text available: pdf(1.28 MB) Additional Information: full citation, abstract, references, citings, index terms 

Squeak is an open, highly-portable Smalltalk implementation whose virtual machine is 
written entirely in Smalltalk, making it easy to debug, analyze, and change. To achieve 
practical performance, a translator produces an equivalent C program whose performance 
is comparable to commercial Smalltalks. Other noteworthy aspects of Squeak include: a 
compact object format that typically requires only a single word of overhead per object; a 
simple yet efficient incremental garbage collector for 32-bit d 

Maintaining precise exceptions is an important aspect of achieving full compatibility with a 
legacy architecture. While asynchronous exceptions can be deferred to an appropriate 
boundary in the code, synchronous exceptions must be taken when they occur. This 
introduces uncertainty into liveness analysis since processor state that is otherwise dead 
may be exposed when an exception handler is invoked. Previous systems either had to 
sacrifice full compatibility to achieve more freedom to perform op ... 

For many applications, branch mispredictions and cache misses limit a processor's 
performance to a level well below its peak instruction throughput. A small fraction of 
static instructions, whose behavior cannot be anticipated using current branch predictors 
and caches, contribute a large fraction of such performance degrading events. This paper 
analyzes the dynamic instruction stream leading up to these performance degrading 
instructions to identify the operations necessary to exec 

We have developed and implemented techniques that double the performance of 
dynamically-typed object-oriented languages. Our SELF implementation runs twice as fast 
as the fastest Smalltalk implementation, despite SELF's lack of classes and explicit 
variables. To compensate for the absence of classes, our system uses implementation- 
level maps to transparently group objects cloned from the same prototype, providing data 
type information and eliminating the apparent ... 

In modern processors, the dynamic translation of virtual addresses to support virtual 
memory is done before or in parallel with the first-level cache access. As processor 
technology improves at a rapid pace and the working sets of new applications grow 
insatiably, the latency and bandwidth demands on the TLB (Translation Lookaside Buffer) 
are getting more and more difficult to meet. The situation is worse in multiprocessor 
systems, which run larger applications and are plagued by the TLB consiste ... 

This paper describes Embra, a simulator for the processors, caches, and memory systems 
of uniprocessors and cache-coherent multiprocessors. When running as part of the SimOS 
simulation environment, Embra models the processors of a MIPS R3000/R4000 machine 
faithfully enough to run a commercial operating system and arbitrary user applications. To 
achieve high simulation speed, Embra uses dynamic binary translation to generate code 
sequences which simulate the workload. It is the first machine simu ... 

Efficient implementation of the Smalltalk-80 system 

January 1984 Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages 

Publisher: ACM Press 

Full text available: pdf(595.22 KB) Additional Information: full citation, abstract, references, citings, index terms 

The Smalltalk-80* programming language includes dynamic storage allocation, full 
upward funargs, and universally polymorphic procedures; the Smalltalk-80 programming 
system features interactive execution with incremental compilation, and implementation 
portability. These features of modern programming systems are among the most difficult 
to implement efficiently, even individually. A new implementation of the Smalltalk-80 
system, hosted on a small microprocessor-based computer, achieves hig 

Dynamically-typed object-oriented languages please programmers, but their lack of static 
type information penalizes performance. Our new implementation techniques extract 
static type information from declaration-free programs. Our system compiles several 
copies of a given procedure, each customized for one receiver type, so that the type of 
the receiver is bound at compile time. The compiler predicts types that are statically 
unknown but likely, and inserts ... 

BrouHaHa is a portable implementation of the Smalltalk-80 virtual machine interpreter. It 
is a more efficient redesign of the standard Smalltalk specification, and is tailored to suit 
conventional 32-bit microprocessors. This paper presents the major design changes and 
optimization techniques used in the BrouHaHa interpreter. The interpreter runs at 30% of 
the speed of the Dorado on a Sun 3/160 workstation. The implementation is portable 
because it is written in C 